Hashing and Indexing: Succinct Data Structures and Smoothed Analysis

Authors

  • Alberto Policriti
  • Nicola Prezza
Abstract

We consider the problem of indexing a text $T$ of length $n$ with a lightweight data structure that supports efficient search for patterns $P$ of length $m$, allowing errors under the Hamming distance. We propose a hash-based strategy that employs two classes of hash functions, dubbed Hamming-aware and de Bruijn, to drastically reduce the search space and the memory footprint of the index, respectively. We use our succinct hash data structure to solve the k-mismatch search problem in $2n \log \sigma + o(n \log \sigma)$ bits of space with a randomized algorithm having smoothed complexity $O((2\sigma)^k (\log n)^k (\log m + \xi) + (occ + 1) \cdot m)$, where $\sigma$ is the alphabet size, $occ$ is the number of occurrences, and $\xi$ is a term depending on $m$, $n$, and the amplitude $\epsilon$ of the noise perturbing text and pattern. Significantly, we obtain that for any $\epsilon > 0$ and for $m$ large enough, $\xi \in O(\log m)$; our results thus improve upon previous linear-space solutions of the k-mismatch problem.
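To make the problem statement concrete, the following is a minimal sketch of k-mismatch search done the naive way, by comparing the pattern against every text window. It is not the hash-based index described in the abstract; it is only an O(nm) baseline that defines what an occurrence is. The function names `hamming_distance` and `k_mismatch_search` are illustrative assumptions, not names from the paper.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_search(text: str, pattern: str, k: int) -> list[int]:
    """Return every start position i such that text[i:i+m] is within
    Hamming distance k of pattern (naive scan, not an index)."""
    n, m = len(text), len(pattern)
    # range() is empty when the pattern is longer than the text.
    return [
        i
        for i in range(n - m + 1)
        if hamming_distance(text[i:i + m], pattern) <= k
    ]


# Hypothetical example input: a short DNA-like text and pattern.
occurrences = k_mismatch_search("ACGTACGAACGT", "ACGT", 1)
print(occurrences)  # [0, 4, 8]
```

With k = 1 the window at position 4 (`ACGA`) is reported because it differs from the pattern in one position; with k = 0 only the exact matches at positions 0 and 8 remain. An index such as the one in the abstract aims to avoid this full scan of the text.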

Similar resources

Indexing Algorithm Based on Improved Sparse Local Sensitive Hashing

In this article, we propose a new semantic hashing algorithm to address newly emerging problems, such as the difficulty of similarity measurement in high-dimensional data. Building on locality-sensitive hashing and spectral hashing, we introduce sparse principal component analysis (SPCA) to reduce the dimension of the data set, which excludes the redundancy in the parameter list, and thus make h...

Comparison Of Modified Dual Ternary Indexing And Multi-Key Hashing Algorithms For Music Information Retrieval

In this work, we compare two indexing algorithms that have been used to index and retrieve Carnatic music songs. Specifically, we compare a modified version of the dual ternary indexing algorithm for music indexing and retrieval with the multi-key hashing indexing algorithm that we proposed. The modification to the dual ternary algorithm was essential to handle variable-length query phrases and to ...

Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization

The recent proliferation of smart devices necessitates methods to learn small-sized models. This paper demonstrates that if there are $m$ features in total but only $n = o(\sqrt{m})$ features are required to distinguish examples, with $\Omega(\log m)$ training examples and reasonable settings, it is possible to obtain a good model in a succinct representation using $n \log_2 \frac{m}{n} + o(m)$ bits, by using a pipeline of exi...

Image authentication using LBP-based perceptual image hashing

Feature extraction is a main step in all perceptual image hashing schemes, in which robust features lead to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency, and robustness to illumination changes are regarded as distinguishing properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...

An adaptive hashing technique for indexing moving objects

Although hashing techniques are widely used for indexing moving objects, they cannot handle a dynamic workload, e.g. the traffic at peak hour versus the traffic at night. This paper proposes an adaptive hashing technique to support dynamic workloads efficiently. The proposed technique maintains two levels of hashes, one for fast-moving objects and the other for quasi-static objects. A moving...

Publication date: 2014